CHAPTER 4: Endian encodings & word registers

CHAPTER 4.1: Endian encodings
  You should already have a fairly exact idea of byte variables. You already
  know they are 8 bits long (not so important now) and that they can contain a
  numeric value from 0 to 255. About word variables you know that they are 16
  bits long and that they contain a value from 0 to 65535.

  As you may have noticed, a word is the same size as two bytes. Now let's
  think about how a value is stored in two bytes. Each byte can contain a value
  from 0 to 255, so together they give 256*256 = 65536 combinations. But how is
  the value stored in these bytes? Let's say one of the bytes (byte #1) contains
  0. Then the other byte (byte #2) can hold a value from 0 to 255, so we can
  store values 0 to 255 in our word. When byte #1 contains 1, we can store
  another 256 numbers, 256 to 511. When byte #1 contains 2, we can store another
  256 numbers, 512 to 767, etc. In total that is 256*256 = 65536 values, as I
  said. It is like decimal numbers: every digit has a value 0 to 9, and the
  "true" value of a digit depends on its position. The last digit holds 0 to 9,
  the digit before it holds 10*(0 to 9), the one before that 100*(0 to 9), etc.
  It is the same in words: one of the bytes holds a value 0 to 255, the other
  holds 256*(0 to 255). The one which holds 0..255 is called the "low order
  byte", the other (which holds 256*(0..255)) is called the "high order byte".

  >>> term: low order byte, high order byte.

  Examples: (word value = high order byte : low order byte)
    0     = 0 : 0
    1     = 0 : 1
    255   = 0 : 255
    256   = 1 : 0
    257   = 1 : 1
    511   = 1 : 255
    512   = 2 : 0
    513   = 2 : 1       (513 div 256 = 2, 513 mod 256 = 1)
    65535 = 255 : 255   (65535 div 256 = 255, 65535 mod 256 = 255)
    
  One last problem remains: the order of these bytes (i.e. which comes first,
  the low order byte or the high order byte?). This differs between computers.
  On IBM PCs (and compatibles) the low order byte is first and the high order
  byte follows it. For example:
    label variable
    dw 0
  then "byte [variable]" is the low order byte and "byte [variable + 1]" is the
  high order byte. (The addition of 1 to the offset of "variable" is done by the
  compiler. It means the next byte after the offset of "variable"; I think this
  is clear enough not to need any more explanation.)
  
  NOTE: When the low order byte is first it is called "little endian encoding";
  when the high order byte is first it is called "big endian encoding". These
  terms are not important yet, especially for a beginner asm coder.
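
  To see this byte order for yourself, here is a small sketch. It assumes you
  already know from previous chapters how to read a byte from memory and that
  int 21h/ah=2 writes the character in "dl". The values 65 and 66 are chosen
  only because they are the ASCII codes of "A" and "B" (text after ";" is a
  comment):
    org 256
    mov ah,2
    mov dl,byte [variable]        ; low order byte, 65 = 'A'
    int 21h
    mov ah,2
    mov dl,byte [variable + 1]    ; high order byte, 66 = 'B'
    int 21h
    int 20h
    label variable
    dw 66*256 + 65                ; high order byte 66, low order byte 65
  On a PC it should write "AB", because the low order byte (65) is stored
  first.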

CHAPTER 4.2: Word registers

  Besides byte registers (like "al","ah","dl"...) the processor has some word
  registers too, of course. As you know, a word is a combination of two bytes,
  and the same holds for the word registers we'll learn first: they are
  combinations of byte registers. These registers are "ax", "bx", "cx" and "dx".
  "ax" is the combination of "al" and "ah": "al" is the low order byte, "ah" is
  the high order byte. The same goes for bx = bh:bl, cx = ch:cl, dx = dh:dl. If
  you wanted to "emulate" a register "ex" in memory, it would be:
    label ex word
    el db 0
    eh db 0
  "el" would be the low order byte, so it comes first.
  >>> term: word register, ax, bx, cx, dx

  NOTE: the letters a,b,c,d stand for "accumulator", "base", "counter" and
    "data"; they have nothing to do with alphabetical order. The real order of
    these registers (in machine code) is ax,cx,dx,bx, but that is not important
    until you want to generate/change machine code yourself.
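
  Coming back to the register "ex" emulated in memory: a sketch of how it
  behaves like a word register, assuming the "ex", "el" and "eh" definitions
  shown above are somewhere in your file:
    mov word [ex],513     ; 513 = 2*256 + 1
    mov al,byte [el]      ; al = 1 (low order byte, stored first)
    mov ah,byte [eh]      ; ah = 2 (high order byte, stored second)
  After this "ax" holds 513 again, the same as the word stored in "ex".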

  Now, if you want to set the value of register "ax" to 52 you use
    mov ax,52
  but you could also use
    mov al,52
    mov ah,0
  Or, to set "dx" to 12345, you use
    mov dx,12345
  but it could also be (there is no reason to do it this way)
    mov dh,48
    mov dl,57
  because 48 is equal to 12345 / 256 (rounded down) and 57 is 12345 modulo 256
  (modulo is the remainder from division): 48*256 + 57 = 12288 + 57 = 12345.

  NOTE: You know that an instruction operand can be a number (numeric constant),
  like "0", "256", "12345" etc. But every assembler I know allows you to put an
  expression in place of the number. During compilation the expression is
  evaluated and "replaced" by its result. So "mov dx,(1 + 5)" is the same as
  "mov dx,6". Or better, the code above can be written as
    mov dh,12345 div 256
    mov dl,12345 mod 256
  ("div" is the operator for division, "mod" is the operator which returns the
  remainder from division (modulo). You don't have to know these operators now,
  but you should already know something about expressions.)

  The processor also has other word registers: "sp", "bp", "si", "di". But you
  can't directly access the byte parts of these registers; you must access the
  whole word. For example, if you want to set the high order byte of "si" to 17
  you can do it this way:
    mov ax,si
    mov ah,17
    mov si,ax
  So first you copy the value of "si" to "ax". The high order byte of "ax" can
  be directly accessed (it is the "ah" register), so you set it. The low order
  byte remains. Then you copy the value back from "ax" to "si". The high order
  byte is changed, the low order byte remains unchanged.
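
  The same trick works for the low order byte, and with another register as
  temporary storage. Here is a sketch that sets the low order byte of "di" to
  17. The choice of "bx" is arbitrary (any of ax, bx, cx, dx would do), and
  note that its previous value is lost:
    mov bx,di     ; copy whole word to bx
    mov bl,17     ; change only the low order byte
    mov di,bx     ; copy back, high order byte of di is unchanged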

  NOTE: register "sp" always has a special function, "bp" usually has a special
  function (in code generated by most compilers of non-assembly languages).
  Registers "si" and "di" can be used however you want.

CHAPTER 4.3: String output using int 21h/ah=9

  This should be part of chapter 3 about addresses, but you need to know the
  "dx" register, which is explained here. I warned you this tutorial is unusual.

  Here we will talk about another use of "int 21h". You should already know
  that when "ah" contains 2, "int 21h" writes the character in "dl". But if we
  want to display some longer text we must set "dl" for every character, which
  is a bad method. Wouldn't it be better to just store the string we want to
  display somewhere in the file and then display it from there?
  
  For this we can use "int 21h" with the value 9 in "ah" and the address of the
  string in the "dx" register. Something like:
    mov ah,9
    mov dx,address_of_string
    int 21h
    
  But another problem comes up: how to determine the length of the string, i.e.
  the number of characters to display. There are several methods for this; we
  will talk about the simplest one, used by int 21h/ah=9. A special character
  is reserved as the end-of-string marker. For int 21h/ah=9 it is the character
  "$". So to store the string "Hello World", you define "Hello World$", where
  "$" means end of string. Example of displaying a string:
    org 256
    mov ah,9
    mov dx,text_to_display
    int 21h
    int 20h
    label text_to_display
    db 'Hello World$'
  This program will display "Hello World".

  This method of marking the end of a string has a limitation: you can't
  display the character "$". For example:
    org 256
    mov ah,9
    mov dx,text_to_display
    int 21h
    int 20h
    label text_to_display
    db 'It costed 50$, maybe more$'
  will of course display only "It costed 50". This case can be solved this way:
    org 256
    
    mov ah,9
    mov dx,text1
    int 21h
    
    mov ah,2
    mov dl,'$'
    int 21h
    
    mov ah,9
    mov dx,text2
    int 21h
    
    int 20h
    
    label text1
    db 'It costed 50$'
    label text2
    db ', maybe more$'
  The first part (first "int 21h") will write "It costed 50", then int
  21h/ah=2 will write "$" and the second int 21h/ah=9 will write ", maybe more".
  We won't care about this limitation for now; this was just to improve the
  explanation.

  A closer look at int 21h/ah=9: as you may have realized, it will display
  every character (exactly: every character whose ASCII code is stored in a
  byte) from the address in "dx" up to, but not including, the first character
  "$" after that address.
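
  Because only the address in "dx" and the "$" matter, the address doesn't have
  to point to the start of the string you defined. A small sketch (the offset 6
  is simply the number of characters in "Hello ", including the space):
    org 256
    mov ah,9
    mov dx,text + 6     ; address of 'W'
    int 21h
    int 20h
    label text
    db 'Hello World$'
  This should display only "World".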

  NOTE: Some ASCII codes in the range 0 to 31 have a special meaning for int
  21h/ah=9. These codes have characters assigned to them (smiling faces,
  diamonds etc.), but for some of them int 21h/ah=9 doesn't display the
  character and does something else instead. For example, the character with
  ASCII code 7 will cause it to beep for a short while. Try this:
    org 256
    mov ah,9
    mov dx,text
    int 21h
    int 20h
    label text
    db 'Beep',7,'$'
  It should write "Beep" and then beep.

  Other common values are 13 and 10. 13 causes the cursor to return to the
  first column of the current row. 10 causes the cursor to move one row down (if
  the bottom of the screen is reached then the screen is scrolled). So the
  combination of the two causes the cursor to move to the first column of the
  next row. These two should (but don't always) work in either order, but you
  should always put 13 first. Together these two characters are often called
  EOL (end of line). Try this example:
    org 256
    mov ah,9
    mov dx,text
    int 21h
    int 20h
    label text
    db 'Line 1',13,10,'Line 2$'
  it should write:
    Line 1
    Line 2
  
  NOTE: ASCII code 13 is called CR (carriage return) and code 10 is called LF
  (line feed).
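
  To see what CR does on its own, here is a small sketch that uses 13 without
  10 (the text "XX" is just an arbitrary filler):
    org 256
    mov ah,9
    mov dx,text
    int 21h
    int 20h
    label text
    db 'Line 1',13,'XX$'
  The cursor returns to the first column without moving down, so "XX" is
  written over "Li" and the result should be "XXne 1".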
